Spatial Econometrics & Outlook
Stefan Jünger & Dennis Abel
2025-04-10
Now
April 09  10:00-11:30  Introduction
April 09  11:30-11:45  Coffee Break
April 09  11:45-13:00  Data Formats
April 09  13:00-14:00  Lunch Break
April 09  14:00-15:30  Mapping I
April 09  15:30-15:45  Coffee Break
April 09  15:45-17:00  Spatial Wrangling
April 10  09:00-10:30  Mapping II
April 10  10:30-10:45  Coffee Break
April 10  10:45-12:00  Applied Spatial Linking
April 10  12:00-13:00  Lunch Break
April 10  13:00-14:30  Spatial Autocorrelation
April 10  14:30-14:45  Coffee Break
April 10  14:45-16:00  Spatial Econometrics & Outlook
What is spatial econometrics?
Econometrics, in short, uses statistics to model (complex) theories …
it is geared toward causal inference and causal thinking
by default, we think of regression analysis
Spatial econometrics therefore combines spatial analysis and econometrics
it studies why spatial relationships (i.e., autocorrelation) exist
and how spatial autocorrelation affects our outcome of interest
What is the data generation process?
Spatial diffusion vs. spatial spillover
There are at least two common mechanisms we are interested in when doing spatial econometrics
Diffusion
\(y_i\) affects \(y_j\) through \(w_{ij}\)
\(y_j\) affects \(y_i\) through \(w_{ji}\)
that’s a feedback effect
Examples:
pandemic and policy measures to contain the pandemic
diffusion of violence in a war
Spillover
\(x_i\) affects \(y_j\) through \(w_{ij}\)
\(x_j\) affects \(y_i\) through \(w_{ji}\)
Examples:
spillover of economic strength and trade
Let’s have another look at our chessboard
We have to think about theories and mechanisms and how they translate into spatial effects and the data generation process.
That said, there are tests to check for the specific data generation process at hand, but they should not be applied naively.
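To make the two mechanisms concrete, here is a minimal sketch that simulates both data generation processes on an 8 x 8 chessboard using base R only. The grid size, the parameter values (`rho`, `theta`, `beta`), and the rook-style neighborhood are illustrative assumptions, not values from the workshop data.

```r
# Minimal sketch: simulate diffusion (lag of y) and spillover (lag of x)
# on an 8 x 8 chessboard. All values are hypothetical.
set.seed(42)

n_side <- 8
cells <- expand.grid(row = seq_len(n_side), col = seq_len(n_side))
n <- nrow(cells)

# Rook contiguity: cells sharing an edge are neighbors
# (Manhattan distance of exactly 1 between grid positions).
manhattan <- as.matrix(dist(cells, method = "manhattan"))
W <- (manhattan == 1) * 1
W <- W / rowSums(W)  # row-standardize so that each row sums to 1

x <- rnorm(n)
eps <- rnorm(n)
beta <- 1
rho <- 0.6    # strength of diffusion (assumed)
theta <- 0.6  # strength of spillover (assumed)

# Diffusion: y depends on neighboring y, so we solve the reduced form
# y = (I - rho * W)^{-1} (x * beta + eps), which includes the feedback.
y_diffusion <- solve(diag(n) - rho * W, x * beta + eps)

# Spillover: y depends on neighboring x only; there is no feedback through y.
y_spillover <- x * beta + drop(W %*% x) * theta + eps

# Correlation of each outcome with its own spatial lag
cor(y_diffusion, drop(W %*% y_diffusion))
cor(y_spillover, drop(W %*% y_spillover))
```

Both simulated outcomes can show spatial clustering, which is why the mechanism should come from theory rather than from the observed pattern alone.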
Is it meaningful or just a nuisance?
Space can be important in our analysis in two ways.
it’s meaningful in our theory and we thus interpret it accordingly after estimation
it can distort our empirical estimates, producing bias, inconsistency, and inefficiency
We can address both of these different perspectives in our analysis with spatial econometric methods.
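For reference, the three basic specifications estimated in this session can be written in the same notation as the extended models, where \(W\) is the spatial weight matrix and \(\lambda\), \(\theta\), and \(\rho\) are the spatial parameters. This is a compact summary of the standard textbook forms, not an exhaustive taxonomy:
Spatial Error Model (space as nuisance):
\[Y = X\beta + u\] \[u = \lambda Wu + \epsilon\]
Spatial Lag X Model (spillover):
\[Y = X\beta + WX\theta + \epsilon\]
Spatial Lag Y Model (diffusion):
\[Y = \rho WY + X\beta + \epsilon\]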
Flavors and extensions
Spatial Durbin Model:
\[Y = \rho WY + X\beta + WX\theta + \epsilon \]
Spatial Durbin Error Model:
\[Y = X\beta + WX\theta + u\] \[u = \lambda Wu + \epsilon\]
Combined Spatial Autocorrelation Model:
\[Y = \rho WY + X\beta + u\] \[u = \lambda Wu + \epsilon\]
Manski Model:
\[Y = \rho WY + WX\theta + X\beta + u\] \[u = \lambda Wu + \epsilon\]
Linear regression
linear_regression <-
  lm(afd_share ~ immigrant_share + inhabitants, data = election_results)
summary(linear_regression)
Call:
lm(formula = afd_share ~ immigrant_share + inhabitants, data = election_results)
Residuals:
    Min      1Q  Median      3Q     Max
-15.010  -3.397  -0.232   2.790  25.032
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     27.737242   0.579582  47.857  < 2e-16 ***
immigrant_share -0.097675   0.026150  -3.735 0.000207 ***
inhabitants     -0.079595   0.003812 -20.879  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.843 on 540 degrees of freedom
Multiple R-squared: 0.4822, Adjusted R-squared: 0.4803
F-statistic: 251.4 on 2 and 540 DF, p-value: < 2.2e-16
Now we need a spatial weight
To estimate a spatial regression we, once again, have to construct a spatial weight as in the analysis of spatial autocorrelation. In fact, we’ll use the same approach as before.
queen_neighborhoods <- spdep::poly2nb(election_results, queen = TRUE)
queen_W <- spdep::nb2listw(queen_neighborhoods, style = "W")
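The `style = "W"` argument row-standardizes the weights, so every row of the weight matrix sums to 1 and a spatial lag becomes the mean of the neighbors' values. The following base R sketch mimics this on a hypothetical 3 x 3 grid with queen contiguity; it does not use `spdep`, and the grid and the values of `x` are assumptions for illustration only.

```r
# Queen contiguity on a 3 x 3 grid: cells touching at an edge or a corner
# (Chebyshev distance of exactly 1 between grid positions).
cells <- expand.grid(row = 1:3, col = 1:3)
chebyshev <- as.matrix(dist(cells, method = "maximum"))

B <- (chebyshev == 1) * 1  # binary neighbor matrix
W <- B / rowSums(B)        # row-standardized weights

rowSums(B)  # number of neighbors: corners 3, edge cells 5, center 8
rowSums(W)  # every row sums to 1

# With row-standardized weights, the spatial lag is the neighbors' mean
x <- 1:9
lag_x <- drop(W %*% x)
lag_x
```

Row standardization keeps units with many neighbors from receiving a mechanically larger lag than units with few neighbors.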
Spatial Error Model: if we want to control for nuisance
spatial_error_model <-
  spatialreg::errorsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )
summary(spatial_error_model)
Call:spatialreg::errorsarlm(formula = afd_share ~ immigrant_share +
inhabitants, data = election_results, listw = queen_W)
Residuals:
     Min       1Q   Median       3Q      Max
-9.45189 -2.38063 -0.41255  1.94994 25.74532
Type: error
Coefficients: (asymptotic standard errors)
                  Estimate Std. Error z value  Pr(>|z|)
(Intercept)     22.9287086  0.9506456 24.1191 < 2.2e-16
immigrant_share -0.0900839  0.0281413 -3.2011  0.001369
inhabitants     -0.0333569  0.0046216 -7.2175 5.294e-13
Lambda: 0.764, LR test value: 217.73, p-value: < 2.22e-16
Asymptotic standard error: 0.03331
z-value: 22.936, p-value: < 2.22e-16
Wald statistic: 526.05, p-value: < 2.22e-16
Log likelihood: -1516.68 for error model
ML residual variance (sigma squared): 13.541, (sigma: 3.6798)
Number of observations: 543
Number of parameters estimated: 5
AIC: 3043.4, (AIC for lm: 3259.1)
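The fit statistics in this output are linked by simple arithmetic, which the sketch below retraces from the printed (rounded) values. It assumes that the plain linear model counts 4 parameters (three coefficients plus the residual variance); small deviations from the printed LR value come from rounding.

```r
# Values copied from the printed output (rounded)
log_lik_sem <- -1516.68
k_sem <- 5       # "Number of parameters estimated"
aic_lm <- 3259.1 # "AIC for lm"
k_lm <- 4        # assumption: 3 coefficients + residual variance

# AIC = 2k - 2 * log-likelihood
aic_sem <- 2 * k_sem - 2 * log_lik_sem
aic_sem

# Back out the log-likelihood of the linear model from its AIC
log_lik_lm <- (2 * k_lm - aic_lm) / 2

# Likelihood ratio test of the error model against the linear model
lr_stat <- 2 * (log_lik_sem - log_lik_lm)
lr_stat

# One additional parameter (lambda), hence one degree of freedom
pchisq(lr_stat, df = 1, lower.tail = FALSE)
```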
Spatial Lag X Model: estimating spillovers
spatial_lag_x_model <-
  spatialreg::lmSLX(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )
summary(spatial_lag_x_model)
Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
data = as.data.frame(x), weights = weights)
Coefficients:
                      Estimate Std. Error    t value   Pr(>|t|)
(Intercept)          3.062e+01  6.777e-01  4.517e+01 3.153e-185
immigrant_share     -7.937e-02  3.450e-02 -2.301e+00  2.178e-02
inhabitants         -2.496e-02  5.910e-03 -4.223e+00  2.827e-05
lag.immigrant_share -1.358e-02  4.769e-02 -2.848e-01  7.759e-01
lag.inhabitants     -8.721e-02  7.721e-03 -1.130e+01  1.073e-26
Spatial Lag Y Model: estimating diffusion
spatial_lag_y_model <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )
summary(spatial_lag_y_model)
Call:spatialreg::lagsarlm(formula = afd_share ~ immigrant_share +
inhabitants, data = election_results, listw = queen_W)
Residuals:
     Min       1Q   Median       3Q      Max
-10.1439  -2.2643  -0.2712   1.9608  24.2918
Type: lag
Coefficients: (asymptotic standard errors)
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)     10.1883400  0.9871671 10.3208  < 2e-16
immigrant_share -0.0549475  0.0197113 -2.7876  0.00531
inhabitants     -0.0329164  0.0034473 -9.5484  < 2e-16
Rho: 0.66913, LR test value: 261.29, p-value: < 2.22e-16
Asymptotic standard error: 0.035258
z-value: 18.978, p-value: < 2.22e-16
Wald statistic: 360.16, p-value: < 2.22e-16
Log likelihood: -1494.896 for lag model
ML residual variance (sigma squared): 13.024, (sigma: 3.6089)
Number of observations: 543
Number of parameters estimated: 5
AIC: 2999.8, (AIC for lm: 3259.1)
LM test for residual autocorrelation
test value: 20.507, p-value: 5.9418e-06
Comparison: What’s ‘better’?
AIC(spatial_error_model, spatial_lag_x_model, spatial_lag_y_model)
                    df      AIC
spatial_error_model  5 3043.360
spatial_lag_x_model  6 3146.285
spatial_lag_y_model  5 2999.792
spdep::lm.LMtests(linear_regression, queen_W, test = c("LMerr", "LMlag"))
Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
dependence
data:
model: lm(formula = afd_share ~ immigrant_share + inhabitants, data =
election_results)
test weights: listw
RSerr = 206.05, df = 1, p-value < 2.2e-16
Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
dependence
data:
model: lm(formula = afd_share ~ immigrant_share + inhabitants, data =
election_results)
test weights: listw
RSlag = 308.15, df = 1, p-value < 2.2e-16
Let’s stick to our theory, shall we?
More important: interpretation
Unfortunately, in a Spatial Lag Y Model the spatial parameter \(\rho\) only tells us whether the spatial effect is (statistically) significant or not, and the coefficients can no longer be read as simple marginal effects.
remember: these models are endogenous by design
we have effects of \(y_j\) on \(y_i\) and vice versa
what a mess
Luckily, there’s a method to decompose the spatial effects into direct, indirect, and total effects: estimating impacts
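A sketch of the idea behind impacts for the Spatial Lag Y Model, using the notation of the model equations above and the standard decomposition into average direct, indirect, and total effects. The symbols \(S_k(W)\), \(n\) (number of units), and \(\iota\) (a vector of ones) are introduced here for illustration. Solving the model for \(Y\) gives the reduced form:
\[Y = (I - \rho W)^{-1}(X\beta + \epsilon)\]
A change in the \(k\)-th covariate therefore reaches all units through the \(n \times n\) matrix
\[S_k(W) = (I - \rho W)^{-1}\beta_k\]
and the reported impacts are averages over this matrix:
\[\text{Direct}_k = \frac{1}{n}\operatorname{tr}\left(S_k(W)\right) \qquad \text{Total}_k = \frac{1}{n}\iota' S_k(W)\iota \qquad \text{Indirect}_k = \text{Total}_k - \text{Direct}_k\]
With a row-standardized \(W\) and \(|\rho| < 1\), the total impact simplifies to \(\beta_k / (1 - \rho)\).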
Impact estimation in R
This time, let’s start with the Spatial Lag Y Model:
spatialreg::impacts(spatial_lag_y_model, listw = queen_W)
Impact measures (lag, exact):
                     Direct    Indirect       Total
immigrant_share -0.06185718 -0.10421054 -0.16606773
inhabitants     -0.03705569 -0.06242758 -0.09948327
Compare it to the ‘simple’ regression output:
coef(spatial_lag_y_model)
            rho     (Intercept) immigrant_share     inhabitants
     0.66912619     10.18833999     -0.05494746     -0.03291641
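The comparison can be retraced by hand: with row-standardized weights, the total impact in a lag model equals the coefficient divided by \(1 - \rho\). A minimal sketch using the coefficients printed above (the object names are chosen here for illustration):

```r
# Coefficients copied from coef(spatial_lag_y_model)
rho <- 0.66912619
beta <- c(immigrant_share = -0.05494746, inhabitants = -0.03291641)

# Total impact under row-standardized weights: beta / (1 - rho)
total_impact <- beta / (1 - rho)
total_impact

# The spatial multiplier that scales every coefficient
1 / (1 - rho)
```

The results match the Total column of the impact measures, and each total is roughly three times the raw coefficient because of the feedback through neighboring units.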
Spatial Lag X impacts
spatialreg::impacts(spatial_lag_x_model, listw = queen_W)
Impact measures (SlX, glht):
                     Direct    Indirect       Total
immigrant_share -0.07937153 -0.01358185 -0.09295338
inhabitants     -0.02495873 -0.08720666 -0.11216539
Compare it to the ‘simple’ regression output:
coef(spatial_lag_x_model)
        (Intercept)     immigrant_share         inhabitants lag.immigrant_share
        30.61564892         -0.07937153         -0.02495873         -0.01358185
    lag.inhabitants
        -0.08720666
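In the Spatial Lag X Model there is no feedback through the outcome, so the impacts follow directly from the coefficients: the direct impact is the coefficient of the variable, the indirect impact is the coefficient of its spatial lag, and the total is their sum. A short sketch with the values printed above (object names are illustrative):

```r
# Coefficients copied from coef(spatial_lag_x_model)
beta <- c(immigrant_share = -0.07937153, inhabitants = -0.02495873)
theta <- c(immigrant_share = -0.01358185, inhabitants = -0.08720666)

# Direct = own coefficient, Indirect = lag coefficient, Total = sum
impacts_slx <- cbind(Direct = beta, Indirect = theta, Total = beta + theta)
impacts_slx
```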
If you need p-values and stuff
spatialreg::impacts(spatial_lag_y_model, listw = queen_W, R = 500) |>
  summary(zstats = TRUE, short = TRUE)
Impact measures (lag, exact):
                     Direct    Indirect       Total
immigrant_share -0.06185718 -0.10421054 -0.16606773
inhabitants     -0.03705569 -0.06242758 -0.09948327
========================================================
Simulation results ( variance matrix):
========================================================
Simulated standard errors
                    Direct    Indirect      Total
immigrant_share 0.02165998 0.037682955 0.05823011
inhabitants     0.00348887 0.008657481 0.01042944
Simulated z-values:
                    Direct  Indirect     Total
immigrant_share  -2.881339 -2.779895 -2.870755
inhabitants     -10.696953 -7.259307 -9.604310
Simulated p-values:
                Direct     Indirect   Total
immigrant_share 0.0039599  0.0054376  0.0040949
inhabitants     < 2.22e-16 3.8902e-13 < 2.22e-16
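The simulated p-values are two-sided normal p-values of the simulated z-values, which can be checked by hand as sketched below. Because the standard errors come from `R = 500` random draws, the exact numbers will differ slightly between runs unless a seed is set beforehand.

```r
# Simulated z-values for immigrant_share, copied from the output above
z <- c(Direct = -2.881339, Indirect = -2.779895, Total = -2.870755)

# Two-sided p-value under a standard normal distribution
p <- 2 * pnorm(-abs(z))
round(p, 7)
```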
What’s left
Other map types such as
cartograms
hexagon maps
(more) animated maps
network graphs
GIS techniques, such as
geocoding
routing
cluster analysis
More Advanced Spatial(-temporal) Modeling
More data sources…
Data Sources
Some more information:
geospatial data are interdisciplinary
amount of data feels unlimited
data providers and data portals are often specific in the area and/or the information they cover
Some random examples: